Comparative genomics reveals a unique nitrogen-carbon balance system in Asteraceae

The Asteraceae (daisy family) is one of the largest families of plants. The genetic basis for its high biodiversity and excellent adaptability has not been elucidated. Here, we compare the genomes of 29 terrestrial plant species, including two de novo chromosome-scale genome assemblies for stem lettuce, a member of Asteraceae, and Scaevola taccada, a member of Goodeniaceae that is one of the closest outgroups of Asteraceae. We show that Asteraceae originated ~80 million years ago and experienced repeated paleopolyploidization. PII, the universal regulator of nitrogen-carbon (N-C) assimilation present in almost all domains of life, has been conspicuously lost across Asteraceae. Meanwhile, Asteraceae has upgraded the N-C balance system stepwise via paleopolyploidization and tandem duplications of key metabolic genes, resulting in enhanced nitrogen uptake and fatty acid biosynthesis. In addition to suggesting a molecular basis for their ecological success, the unique N-C balance system reported for Asteraceae offers a potential crop improvement strategy.

-The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
-A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
-The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
-A description of all covariates tested
-A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
-A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
-For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
-For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
-For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
-Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
-Accession codes, unique identifiers, or web links for publicly available datasets
-A description of any restrictions on data availability
-For clinical datasets or third party data, please ensure that the statement adheres to our policy

The sequencing data used in this study, assembled chromosomes, unplaced scaffolds, and annotations have been deposited in the Genome Sequence Archive (GSA) and Genome Warehouse (GWH) databases in the BIG Data Center (https://bigd.big.ac.cn/gsa/index.jsp) under accession code PRJCA007442. Detailed annotation information for stem lettuce can also be found in LettuceGDB (https://lettucegdb.com/). Additional files such as the customized repeat library, gene trees and phylogenetic trees have been uploaded to Zenodo (https://zenodo.org/record/8058114). Source data are provided with this paper.

Human research participants
Policy information about studies involving human research participants and Sex and Gender in Research.
Reporting on sex and gender

March 2021
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
The genomes of two samples (lettuce and Scaevola taccada) were sequenced. Leaf, root, stem and flower tissues of the two species were used for transcriptome sequencing, with three biological replicates. For the comparative genomic analysis, a total of 29 representative plant species were selected.
For the Illumina sequencing reads, we removed the adapter sequences and filtered out the low-quality reads.
For the genome assembly, we searched the contigs against a bacterial database to rule out possible contamination. Additionally, sequences from the organellar genomes were excluded from the contigs.
Replicates were used in the plant transformation, detection of metabolites, RNA-seq and RT-qPCR. Moreover, bootstrap replicates were used in the phylogenetic analysis.
No randomization was involved in our study.
No experiments involving blinding were carried out.
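The read-cleaning step above can be illustrated with a minimal sketch. The source does not name the tool, adapter sequence or thresholds that were used, so the adapter prefix, the quality cut-off, the minimum length and the function names below are all assumptions chosen for illustration, and Phred+33 quality encoding is assumed.

```python
from typing import Optional, Tuple

# Hypothetical parameters: the source does not state the adapter or thresholds.
ADAPTER = "AGATCGGAAGAGC"   # assumed adapter prefix, for illustration only
MIN_MEAN_QUALITY = 20       # assumed mean Phred cut-off
MIN_LENGTH = 30             # assumed minimum read length after trimming


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def clean_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Trim an exact adapter match, then drop short or low-quality reads.

    Returns the trimmed (sequence, quality) pair, or None if the read is
    filtered out.
    """
    pos = seq.find(ADAPTER)
    if pos != -1:
        # Keep only the bases upstream of the adapter.
        seq, qual = seq[:pos], qual[:pos]
    if len(seq) < MIN_LENGTH or mean_phred(qual) < MIN_MEAN_QUALITY:
        return None
    return seq, qual
```

Dedicated read trimmers also handle partial and mismatched adapter hits and paired-end reads; this sketch only matches the adapter exactly.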
HRP-conjugated GST-tag, Mouse mAb (Yeasen, #30903ES10)
MBP-tag, Rabbit pAb (Yeasen, #31201ES20)
Goat Anti-Rabbit Mouse IgG-HRP (Abmart, #M21003S)
All primary antibodies used in this study were commercially purchased and validation was performed by the individual companies. Validation data for the specific application is present on the data sheets provided by the company websites.
Flow Cytometry
Plots
Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.

Methodology
Sample preparation Instrument Software Cell population abundance

Gating strategy
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
Tender leaves were collected from the sequenced Sc. taccada plant and analysed using a flow cytometer. Populus trichocarpa (2n=2x=38) and tomato (Solanum lycopersicum) (2n=2x=24) samples were analysed to serve as the genome size reference.
Over 5,000 nuclei per sample were collected and detected.
Filter-625/26 was used in gating. The FL3-H/SSC-H gate method was used to eliminate the debris and cell fragments.
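When a reference species of known genome size is run alongside the sample, as described above, genome size is commonly estimated from the ratio of the G1 fluorescence peaks. The sketch below shows that ratio calculation only. The source does not give the formula, the peak values or the reference genome sizes, so the function name and all numbers in the usage example are hypothetical.

```python
def estimate_genome_size(sample_peak: float, reference_peak: float,
                         reference_size_mb: float) -> float:
    """Estimate genome size (Mb) from G1 peak fluorescence ratios.

    Assumes fluorescence scales linearly with DNA content and that the
    sample and the reference were stained and measured under the same
    conditions.
    """
    if sample_peak <= 0 or reference_peak <= 0 or reference_size_mb <= 0:
        raise ValueError("peaks and reference size must be positive")
    return reference_size_mb * (sample_peak / reference_peak)


# Hypothetical values for illustration only (not taken from the study).
example = estimate_genome_size(sample_peak=200.0, reference_peak=100.0,
                               reference_size_mb=500.0)
print(f"Estimated genome size: {example:.0f} Mb")
```

Using two reference species, as reported here, allows the two independent estimates to be compared as a consistency check.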